Hospitals in New York: Where

nyc_medicare_2017 <- medicare_part_b_2017 %>%
  filter(`Provider Zip Code` %in% nyc_zip_codes) %>%
  mutate(zip_3 = substr(`Provider Zip Code`, 1, 3),
         borough = ifelse(zip_3 == 104, "Bronx", 
                          ifelse(zip_3 == 112, "Brooklyn", 
                                 ifelse(zip_3 == 100, "Manhattan", 
                                        ifelse(zip_3 == 103, "Staten Island", "Queens")))),
         health_plus = ifelse(`Provider Name` %in% nyc_health_plus, "Y", "N"))
nyc_hospitals_borough <- nyc_medicare_2017 %>%
  group_by(borough) %>%
  summarise(number_hospitals = n_distinct(`Provider Name`))
datatable(nyc_hospitals_borough)

Okay so if we look at this nyc_hospitals_borough, we clearly see that there is a geographical imbalance in terms of the number of hospitals per borough. Queens is probably the most underserved borough in this regard since it is also the biggest borough by land area, right?

## Geocoding Hospitals and NYC
hospital_names <- unlist(unique(nyc_medicare_2017$`Provider Name`))
## Avoid geocoding everytime I knit
# nyc_hospitals_address_geocode <- geocode(location = hospital_names, output = "latlon")
# write_csv(nyc_hospitals_address_geocode, "data/hospitals_lat_long.csv")
nyc_geocode <- geocode("New York City")
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=New+York+City&key=xxx-efzyaf4jU
nyc_lat_long <- nyc_geocode %>%
  unlist() %>%
  unname()
## Load csv MAKE SURE
nyc_hospitals_address_geocode <- read_csv(paste(path, "hospitals_lat_long.csv", sep = ""))
## Parsed with column specification:
## cols(
##   lon = col_double(),
##   lat = col_double()
## )
nyc_hospitals_address <- cbind(hospital_names, nyc_hospitals_address_geocode)

## Mount Sinai Hospital is always wrong 
## Must change both files or it will mess up the conversion to ESRI shapefile
nyc_hospitals_address[4,2:3] <- c(-73.953143, 40.7877519)
nyc_hospitals_address_geocode[4,1:2] <- c(-73.953143, 40.7877519)

## using the sp functions 
# epsg32118_nad83 <- CRS("+proj=lcc +lat_1=41.03333333333333 +lat_2=40.66666666666666 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs")
# nyc_hospitals_spdf <- SpatialPointsDataFrame(coords = nyc_hospitals_address_geocode, data = nyc_hospitals_address, proj4string = epsg32118_nad83)
# writeOGR(obj=nyc_hospitals_spdf, dsn = "nyc_hospitals", layer = "nyc_hospitals", driver = "ESRI Shapefile", overwrite_layer = T)

# calculating catchment area
nyc_hospital_shp$area <- st_area(nyc_hospital_shp)*3861.02
nyc_hospital_shp$area_sqmi <- substr(nyc_hospital_shp$area, 1,5)
borough_list <- nyc_medicare_2017 %>%
  select(`Provider Name`, borough) %>%
  unique()
catchment_area <- substr(nyc_hospital_shp$area,1,5)
catchment_area_df <- cbind(nyc_hospital_shp$hsptl_n,catchment_area) %>%
  as.data.frame() %>%
  arrange(desc(V1)) %>%
  inner_join(.,borough_list, by = c("V1" = "Provider Name"))
## Warning: Column `V1`/`Provider Name` joining factor and character vector,
## coercing into character vector
colnames(catchment_area_df) <- c("Provider Name", "Catchment Area (sq.mi)","Borough")
datatable(catchment_area_df)

Here we can see that the hospitals with the largest catchment areas are in Queens, SI, and the Bronx. Visualizing it in a map, we have the following:

leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data = nyc_hospital_shp, label = ~area_sqmi,
              fillOpacity = 0.5, stroke = T, color = "#8da0cb", weight = 2) %>%
  addCircleMarkers(data = nyc_hospital_points, label = ~hsptl_n,
                   weight = 3, radius = 3, color = "#ffa500")

Procedures Performed and Costs

We also have another problem. How do you determine if a procedure is the most common? Do you sum up the number of procedures in a given year, and then rank? Or do you find the most common procedure at each hospital? There are 418 covered procedures in total and we will look at 104 of them, the top quartile basically.

procedures_count <- nyc_medicare_2017 %>%
  group_by(`DRG Definition`) %>%
  summarise(discharge_count = sum(`Total Discharges`)) %>%
  arrange(desc(discharge_count))
head(procedures_count, n = 10)
## # A tibble: 10 x 2
##    `DRG Definition`                                              discharge_count
##    <chr>                                                                   <dbl>
##  1 871 - SEPTICEMIA OR SEVERE SEPSIS W/O MV >96 HOURS W MCC                11414
##  2 470 - MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTRE~            8640
##  3 291 - HEART FAILURE & SHOCK W MCC                                        6420
##  4 392 - ESOPHAGITIS, GASTROENT & MISC DIGEST DISORDERS W/O MCC             3094
##  5 312 - SYNCOPE & COLLAPSE                                                 2965
##  6 690 - KIDNEY & URINARY TRACT INFECTIONS W/O MCC                          2776
##  7 872 - SEPTICEMIA OR SEVERE SEPSIS W/O MV >96 HOURS W/O MCC               2618
##  8 190 - CHRONIC OBSTRUCTIVE PULMONARY DISEASE W MCC                        2198
##  9 378 - G.I. HEMORRHAGE W CC                                               2009
## 10 683 - RENAL FAILURE W CC                                                 1993

So, if we look at the most common procedures by the total number of discharges, septicemia/sepsis treatments are leading the pack.

nyc_medicare_2017_q1 <- nyc_medicare_2017 %>%
  filter(`DRG Definition` %in% unlist(procedures_count$`DRG Definition`)[1:104]) %>%
  select(`Provider Name`, `DRG Definition`, `Total Discharges`:`Average Medicare Payments`, borough, health_plus) %>%
  mutate(DRG_num = as.character(substr(`DRG Definition`, 1, 3)))
nyc_medicare_2017_top40 <- nyc_medicare_2017_q1 %>%
  filter(`DRG Definition` %in% unlist(procedures_count$`DRG Definition`)[1:40])
cost_boxplot <- ggplot(nyc_medicare_2017_top40, aes(DRG_num, `Average Total Payments`)) + 
  geom_boxplot() + 
  labs(title = "Box Plot of Average Total Payments by top 40 DRGs",
       subtitle = "Each observation is the averal total payments attributed to each hospital",
       x = "DRG Number",
       y = "Average Total Payments")
cost_boxplot

nyc_medicare_2017_q1_variances <- nyc_medicare_2017_q1 %>%
  group_by(DRG_num) %>%
  summarise(Range_Total = max(`Average Total Payments`)-min(`Average Total Payments`),
            Range_NonCovered = max(`Average Total Payments` - `Average Medicare Payments`)-min(`Average Total Payments` - `Average Medicare Payments`),
            Range_PctNonCovered = 100*(max(round(((`Average Total Payments`- `Average Medicare Payments`)/`Average Total Payments`),4)) - min(round(((`Average Total Payments`- `Average Medicare Payments`)/`Average Total Payments`),4))),
            n_procedures = sum(`Total Discharges`),
            n_hospitals = n_distinct(`Provider Name`))

total_cost_range_num_hospital_scatter <- ggplot(nyc_medicare_2017_q1_variances, aes(n_hospitals, Range_Total, label = DRG_num)) +
  geom_point() + 
  labs(title = "Scatterplot of Total Costs Ranges",
       subtitle = "Range for each DRG by the number of providers",
       x = "Number of Hospitals",
       y = "Range ($)")
ggplotly(total_cost_range_num_hospital_scatter)
pct_noncovered_range_num_hospital_scatter <- ggplot(nyc_medicare_2017_q1_variances, aes(n_hospitals, Range_PctNonCovered, label = DRG_num)) +
  geom_point() + 
  labs(title = "Scatterplot of Range of Uncovered Costs as Prop of Total Costs",
       subtitle = "Range for each DRG by the number of providers",
       x = "Number of Hospitals",
       y = "Range (%)") + 
  geom_smooth(se = F, method = "lm", color = "#b3cde3")
ggplotly(pct_noncovered_range_num_hospital_scatter)
pct_noncovered_range_num_procedure_scatter <- ggplot(nyc_medicare_2017_q1_variances, aes(n_procedures, Range_PctNonCovered, label = DRG_num)) +
  geom_point() + 
  labs(title = "Scatterplot of Range of Uncovered Costs as Prop of Total Costs",
       subtitle = "Range for each DRG by the number of procedures performed in total",
       x = "Number of Procedures Performed",
       y = "Range (%)")
ggplotly(pct_noncovered_range_num_procedure_scatter)

I want to say that using the range of costs is not very meaningful.

Costs and Hospitals

Okay, so there is a huge variance between hospitals for each of the top forty most common procedures. Further, the boxplots suggest that the distributions of costs for each DRG is are non-Normal. The next question is: are there hospitals that are consistently above or below average?

nyc_medicare_2017_q1_total_payments_zscores <- nyc_medicare_2017_q1 %>%
  group_by(DRG_num) %>%
  mutate(mean_total_payment = mean(`Average Total Payments`),
         sd_total_payment = sd(`Average Total Payments`)) %>%
  ungroup() %>%
  mutate(z_score = (`Average Total Payments`-mean_total_payment)/sd_total_payment)
hospital_total_payments_scatter <- ggplot(nyc_medicare_2017_q1_total_payments_zscores, aes(`Provider Name`, z_score, label = DRG_num)) + 
  geom_point(aes(col = health_plus)) + 
  geom_hline(yintercept = 0) +
  coord_flip() + 
  labs(title = "Average Total Payments Scatterplot",
       subtitle = "No. of std dev for each common DRG by hospital",
       y = "Standard Deviations",
       x = "Hospital")
ggplotly(hospital_total_payments_scatter)

Scandal! All NYC Health-Plus, ie public hospitals, charge more than private ones, with the exception of Coney Island Hospital. More precisely, on average, NYC Health+ hospitals are paid above-average amounts for the procedures performed. Well, I assume that Medicare does not just cover every single part of the procedure. Some parts of the entire treatment process might not be billable to Medicare. So what if we look at the Average Covered Charges?

nyc_medicare_2017_q1_covered_charges_zscores <- nyc_medicare_2017_q1 %>%
  group_by(DRG_num) %>%
  mutate(mean_covered_charges = mean(`Average Covered Charges`),
         sd_covered_charges = sd(`Average Covered Charges`)) %>%
  ungroup() %>%
  mutate(z_score = (`Average Covered Charges`-mean_covered_charges)/sd_covered_charges)
hospital_covered_charges_scatter <- ggplot(nyc_medicare_2017_q1_covered_charges_zscores, aes(`Provider Name`, z_score, label = DRG_num)) + 
  geom_point(aes(col = health_plus)) + 
  geom_hline(yintercept = 0) +
  coord_flip() + 
  labs(title = "Average Covered Charges Scatterplot",
       subtitle = "No. of std dev for each common DRG by hospital",
       y = "Standard Deviations",
       x = "Hospital")
ggplotly(hospital_covered_charges_scatter)

Evidently, this picture is a bit more mixed. So the Average Covered Charges of NYC Health+ hospitals tend to be below average whereas private that of private hospital systems tend to be above average. But if you think about it, what we are really interested in is, how much do beneficiaries (and their insurance providers) have to pay on top of what Medicare pays out. In other words, we really want to look into the co-payments, deductibles, and other additional payments from third parties for coordination of benefits.
Non Medicare Payments = Average Total Payments - Average Medicare Payments.

nyc_medicare_2017_q1_non_medicare_payments_zscores <- nyc_medicare_2017_q1 %>%
  group_by(DRG_num) %>%
  mutate(average_non_medicare_payments = `Average Total Payments` - `Average Medicare Payments`,
         mean_non_medicare_charges = mean(average_non_medicare_payments),
         sd_non_medicare_charges = sd(average_non_medicare_payments)) %>%
  ungroup() %>%
  mutate(z_score = (average_non_medicare_payments-mean_non_medicare_charges)/sd_non_medicare_charges)
hospital_non_medicare_payments_scatter <- ggplot(nyc_medicare_2017_q1_non_medicare_payments_zscores, aes(`Provider Name`, z_score, label = DRG_num)) + 
  geom_point(aes(col = health_plus)) + 
  geom_hline(yintercept = 0) +
  coord_flip() + 
  labs(title = "Average Covered Charges Scatterplot",
       subtitle = "No. of std dev for each common DRG by hospital",
       y = "Standard Deviations",
       x = "Hospital") + 
  facet_wrap(~ borough, nrow = 5, scales = "free_y")
ggplotly(hospital_non_medicare_payments_scatter)

Okay, so this presents a much more nuanced picture. Even though Total Payments might be high at NYC Health+ hospitals, the amount that is not covered by Medicare tends to be below average at NYC Health+ hospitals. Among boroughts, the hospitals in Manhattan tend to incur more non-Medicare covered charges.